I. BACKGROUND OF THE INVENTION

A. Field of the Invention

The invention relates to recommending television shows based
on a user profile.

B. Related Art

U.S. Pat. No. 5,758,259 shows a method for identifying a
preferred television program based on a "correlation" between the
program and predetermined characteristics of a user profile. The
term "correlation" as used in the patent does not appear to
relate to the mathematical concept of correlation, but rather is
a very simple algorithm for assessing some similarity between a
profile and a program.



II. SUMMARY OF THE INVENTION 



It is an object of the invention to improve techniques of
automatic program recommendation.

This object is achieved by using a probabilistic 
calculation, based on a viewer profile. The probabilistic 
calculation is preferably based on Bayesian classifier theory. 

The object is further achieved by maintaining a local record
of a viewer history. The local record is preferably
incrementally updatable. The local record is advantageous for
privacy reasons, and can be contrasted with methods such as
collaborative filtering, which would require viewer history
information to be uploaded to a central location. The use of
incremental updates is advantageous in minimizing storage
requirements.

It is a still further object of the invention to improve the 
classical Bayesian classifier technique. 

In one embodiment, this object is achieved by noise
filtering.

In another embodiment, this object is achieved by applying a
modified Bayesian classifier technique to non-independent feature
values.

Further objects and advantages of the invention will be 
described in the following. 

Bayesian classifiers are discussed in general in the textbook
Duda & Hart, Pattern Classification and Scene Analysis (John
Wiley & Sons 1973). An application of Bayesian classifiers to
document retrieval is discussed in D. Billsus & M. Pazzani,
"Learning Probabilistic User Models",
http://www.dkfi.um-sb.de/~hctuer/um-ws/Final-Versiom/Billsus/ProbUserModels.html

III. BRIEF DESCRIPTION OF THE DRAWING

The invention will now be described by way of non-limiting 
example with reference to the following drawings. 

Fig. 0 shows a system on which the invention may be used.

Fig. 1 shows major elements of an adaptive recommender.

Fig. 2 shows pseudo code for a viewing history generator.

Fig. 3 shows a table of key fields.

Fig. 4 shows a fragment of a viewer profile.

Fig. 5a shows a prior probability calculation.

Fig. 5b shows a conditional probability calculation.

Fig. 5c shows a posterior probability calculation.

IV. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Fig. 0 illustrates hardware for implementing the invention.
The hardware will typically have a display 1; some type of
processor 2; some type of user entry device 4 connected to the
processor via some type of connection 3; and some type of link 5
for receiving data, such as television programming or Electronic
Programming Guide ("EPG") data. The display 1 will commonly be a
television screen, but could be any other type of display device.
The processor 2 may be a set top box, a PC, or any other type of
data processing device, so long as it has sufficient processing
power. The user entry device 4 may be a remote and the
connection 3 may be an infrared connection. If the processor is
a PC, the user entry device will commonly comprise more than one
device, e.g. a keyboard and mouse. The user entry device may also
be touch sensitivity on the display. The connection 5 to the
outside world could be an antenna, cable, a phone line to the
internet, a network connection, or any other data link. Equally
well, connection 5 could connect to a memory device or several
memory devices.

Fig. 1 illustrates major elements of an embodiment of an
adaptive recommender. These elements preferably reside as
software and data in a medium 110 readable by a data processing
device such as CPU 2. The elements include a viewing history
data structure 101 that gives input to profiler software 102.
The profiler software in turn produces the viewer profile 103.
The terms "user profile" and "viewer profile" shall be used
interchangeably herein. The viewer profile serves as input to
recommender software 104. The recommender software also uses, as
input, the EPG data structure 105, which contains features
describing each show such as title, channel, start time and the
like. An output of the recommender 104 appears on a user
interface 106 where a user can interact with it.

The viewing history data structure includes selected records
from the EPG database. EPG databases are commercially
available, for instance from Tribune Media Services. Those of
ordinary skill in the art may devise other formats, possibly with
finer shades of description. The selected records minimally
correspond to TV shows watched by the viewer. It is assumed that
these records have been deposited in the viewing history by
software that is part of the user interface and knows what shows
the viewer has viewed. Preferably, the software would allow
recording of a user watching more than one show in a given time
interval, as users often do switch back and forth during
commercials and so forth. Preferably, also, the software records
a program as watched, and records whether the show was watched or
whether it was taped for later viewing.

The preferred viewing history format assumes the presence of
both positive and negative records in the viewing history. This
is needed because the goal is to learn to differentiate between
the features of shows that are liked and those not liked. Fig. 2
shows pseudo code for collecting the viewing history.

Let the notation C+ denote the set of positive (i.e.,
watched) shows and C- denote the set of negative (i.e., not
watched) shows.

The viewer profile includes a number of feature value
counts. These counts will be incremented whenever new entries
are deposited in the viewer history. Usually, each program will
have several feature values. Accordingly, the deposit of a
program in the viewer history will cause the update of counts
associated with all feature values associated with that program.
The incremental updatability of this type of profile is
advantageous because it allows for ongoing adaptation of the
viewer profile without a large amount of storage or computing
effort being required.

In addition to the count of the number of positive and
negative entries (k(C+), k(C-)), a count of occurrences of
individual features will also be kept among the positive and
negative examples (k(fi|C+), k(fi|C-)), where fi denotes feature i
and k(fi|C+) denotes the number of shows in set C+ that possess
feature fi. The feature set will include entries in the EPG
records extracted from selected key fields, an example of which
is shown in Table 1, which is Fig. 3.
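The incremental count update described above can be sketched as follows. This is only an illustration under stated assumptions: the class name ViewerProfile, the method name deposit, and the (feature type, value) tuples are hypothetical and do not come from Fig. 2 or Fig. 4.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ViewerProfile:
    """Hypothetical container for k(C+), k(C-), k(fi|C+) and k(fi|C-)."""
    k_pos: int = 0  # k(C+): number of watched shows
    k_neg: int = 0  # k(C-): number of not-watched shows
    pos_counts: Counter = field(default_factory=Counter)  # k(fi|C+)
    neg_counts: Counter = field(default_factory=Counter)  # k(fi|C-)

    def deposit(self, features, watched):
        """Update every count touched by one new viewing-history entry."""
        unique = set(features)  # a show possesses a feature value at most once
        if watched:
            self.k_pos += 1
            self.pos_counts.update(unique)
        else:
            self.k_neg += 1
            self.neg_counts.update(unique)


profile = ViewerProfile()
profile.deposit(
    [("channel", "5"), ("daytime", "Tue 20:00"), ("genre", "comedy")],
    watched=True,
)
profile.deposit(
    [("channel", "9"), ("daytime", "Mon 10:00"), ("genre", "news")],
    watched=False,
)
profile.deposit(
    [("channel", "5"), ("daytime", "Tue 20:00"), ("genre", "drama")],
    watched=True,
)
```

Only the counts are stored, so each new entry costs one increment per feature value and the history records themselves need not be retained.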

A partial example of an embodiment of such counts is
presented in Table 2, shown in Fig. 4, to illustrate the idea. The
list is presented in the Figure in six columns to save space, but
in fact the list has only three columns, with the later part of
the table being presented next to the earlier part. Each row of
the table has four pieces of data: a feature type and a feature
value in the first column, a positive count in the second column,
and a negative count in the third column. The positive count
indicates the number of times a program having that feature value
has been watched. The negative count indicates the number of
times a program having that feature value has not been watched.

A television program schedule normally includes several, if
not many, programs for every time slot in every day. Normally,
the user will only watch one or two of the programs in any given
slot. If the viewer profile contains a list of ALL the programs
not watched, the number of programs not watched will far exceed
the number of programs watched. It may be desirable to create a
method for sampling the programs not watched. For instance, as
the processor assembles the viewer profile, the processor may
choose a single not watched program at random from the weekly
schedule as a companion for each watched program, as suggested in
the pseudocode of Fig. 2. This design will attempt to keep the
number of positive and negative entries in the viewing history
about equal so as not to unbalance the Bayesian prior probability
estimates, discussed below.

It is not generally desirable to choose a companion program
from the same time slot as a watched program. Experiment has
shown that the combined time and day feature value is typically
the strongest or one of the strongest predictors of whether a
particular program will be preferred. Thus another program at
the same time as the watched program may well be a second or
third choice program, while a program at a totally different time
may be very undesirable. Accordingly, it is preferred to
choose the companion program at random from the program schedule
of the entire week that includes the watched program.
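A minimal sketch of the companion selection follows, assuming the weekly schedule is available as a list of records. The function name pick_companion and the "id", "title" and "daytime" keys are hypothetical; Fig. 2 is not reproduced here and may differ.

```python
import random


def pick_companion(weekly_schedule, watched_ids, rng=random):
    """Return one not-watched program drawn uniformly from the whole week.

    weekly_schedule: list of dicts with a hypothetical "id" key.
    watched_ids: set of ids of programs the viewer watched that week.
    rng: any object with a choice() method (the random module by default).
    """
    candidates = [p for p in weekly_schedule if p["id"] not in watched_ids]
    if not candidates:
        return None  # nothing left to sample as a negative example
    return rng.choice(candidates)


schedule = [
    {"id": 1, "title": "Evening News", "daytime": "Mon 18:00"},
    {"id": 2, "title": "Football", "daytime": "Tue 20:00"},
    {"id": 3, "title": "Cooking Hour", "daytime": "Wed 10:00"},
    {"id": 4, "title": "Late Movie", "daytime": "Fri 23:00"},
]
companion = pick_companion(schedule, watched_ids={2}, rng=random.Random(0))
```

Calling the function once per watched program keeps the positive and negative entry counts roughly equal.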

Since time and day feature values for a program are often so
important in determining whether a program will be of interest to
a user, it is typically undesirable to consider two programs of
identical content to be the same if they are shown on different
days and/or at different times. In other words a particular
episode of a series may be strongly preferred if it is shown at 8
p.m. on Tuesday, while the same episode of the same series may be
completely undesirable if it is shown at 10 a.m. on Monday. Thus
the episode at 10 a.m. should be considered a different program
from the episode at 8 p.m., even though the content of the two
is identical.

As more and more shows are viewed, the length of the profile
will tend to grow larger and larger. To combat this, and to keep
the focus on features that are effective discriminators, the
following are recommended:

     periodic reviews of the features in the viewer profile,
     and

     removal of words that appear to be frequent and not
     very discriminating.

In general, those of simple tastes, e.g. those who only like
to watch football, will be fairly easy to recommend for after
taking of a viewer history for a relatively short time. For
those of more complex preferences, it will take longer for the
viewer history to be sufficiently meaningful to make good
recommendations. These latter people, however, are those who are
probably most in need of a recommendation.

In the final analysis, viewer histories will always be
ambiguous. Recommendations of shows based on such histories will
always contain a margin for error. The recommendations can at
best be said to have some probability of being correct.
Therefore probabilistic calculations are useful in analyzing
viewer profile data to make recommendations.

The preferred embodiment of the recommender uses a simple
Bayesian classifier using prior and conditional probability
estimates derived from the viewer profile. How recommendations
are shown to viewers is not defined here, yet it will be assumed
that one can capture the viewer's response to them, at least
observing whether or not they were watched.

Below will be discussed a 2-class Bayesian decision model.
The two classes of TV shows of interest are:

     C1 - shows that interest the viewer

     C2 - shows that do not interest the viewer

Other classes might be used showing more shades of interest or
lack thereof.

In contrast with the classes of interest listed above, the
viewing history contains information only on the classes:

     C+ - shows the viewer watched

     C- - shows the viewer did not watch

Determining which shows a user watched or did not watch is
outside of the scope of this application. The user might enter a
manual log of which shows s/he watched. Alternatively, hardware
might record the user's watching behavior. Those of ordinary
skill in the art might devise numerous techniques for this. It
should be possible to consider shows as watched even if they are
watched only for a short time, as a user may be switching back
and forth between several shows, trying to keep track of all of
them.

Inferences may be made about classes C1 and C2 based on
observations, but these inferences will always contain an element
of uncertainty. The Bayesian model will compute the prior
probabilities P(C+) and P(C-) directly from the counts in the
viewer profile in accordance with Fig. 5a. In other words, the
assumption will be that shows not watched are generally those the
viewer is not interested in, and that the shows watched are the
ones that the viewer is interested in.

The conditional probabilities, that a given feature, fi,
will be present if a show is in class C+ or C-, are then computed
in accordance with Fig. 5b. These calculations can be performed
once a day during times that the TV is not being viewed and
stored in the viewer profile.
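Figs. 5a and 5b are not reproduced in this text. The sketch below assumes they are the usual relative-frequency estimates, P(C+) = k(C+) / (k(C+) + k(C-)) and P(fi|C+) = k(fi|C+) / k(C+), and likewise for C-. The function names are hypothetical.

```python
def prior_probabilities(k_pos, k_neg):
    """Relative-frequency priors P(C+), P(C-) from the entry counts."""
    total = k_pos + k_neg
    if total == 0:
        raise ValueError("viewing history is empty")
    return k_pos / total, k_neg / total


def conditional_probability(k_feature_in_class, k_class):
    """Relative-frequency estimate P(fi | class) = k(fi|class) / k(class)."""
    if k_class == 0:
        raise ValueError("class has no entries")
    return k_feature_in_class / k_class


# 40 watched and 40 not-watched shows; 10 of the watched shows are comedies.
p_pos, p_neg = prior_probabilities(k_pos=40, k_neg=40)
p_comedy_given_pos = conditional_probability(k_feature_in_class=10, k_class=40)
```

With one companion program sampled per watched program, the two priors stay close to 0.5, which is the balance the sampling design aims for.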

Recommendations for upcoming shows can be computed by
estimating the posterior probabilities, i.e. the probability that
a show is in class C+ or C- given its features. Let x be a
binary vector (x1, x2, ..., xi, ..., xn) where i indexes over the
features in the viewer profile and where xi = 1 if feature fi is
present in the show being considered for recommendation and 0 if
not. For the exclusive features, like day, time, and station,
where every show must have one and only one value, the index i
will be taken to indicate the value present in the show being
considered provided that this value is also present in the
profile. Otherwise, novel exclusive features will not enter into
the calculations. For non-exclusive features, the index i will
range over all values present in the profile; non-exclusive
features novel to the considered show will not contribute to the
calculations. The posterior probabilities are estimated in
accordance with Fig. 5c.
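Fig. 5c is not reproduced in this text. As an assumption, the classical two-class form for binary features, using the symbols defined above, would be:

```latex
P(C^{+} \mid x) =
  \frac{P(C^{+}) \prod_{i=1}^{n} P(f_i \mid C^{+})^{x_i}
        \bigl(1 - P(f_i \mid C^{+})\bigr)^{1 - x_i}}
       {\sum_{C \in \{C^{+},\, C^{-}\}} P(C) \prod_{i=1}^{n}
        P(f_i \mid C)^{x_i} \bigl(1 - P(f_i \mid C)\bigr)^{1 - x_i}}
```

P(C-|x) follows by exchanging the roles of C+ and C-, so the two posteriors sum to one. Where the index i is restricted for exclusive features as described above, the product runs only over the indices that enter the calculation.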

With these estimates in hand, a show will generally be
recommended if P(C+|x) > P(C-|x) and the "strength" of the
recommendation will be proportional to P(C+|x) - P(C-|x). One
potential problem with this scheme is that some conditional
probabilities are likely to be zero. Any zero in a chain
multiplication will reduce the result to zero so some means for
eliminating zeros is needed. The Billsus and Pazzani article
referenced above presents a couple of schemes, including simply
inserting a small constant for any zeros that occur.

One method for dealing with zeroes in the conditional
probability multiplication chain would be as follows. One can
choose a heuristic of 1000. If the number of shows in the
viewing history is less than 1000, then the value of 1/1000 can
be substituted for zero. If the number of shows in the viewing
history is greater than 1000, the correction can be

     \frac{k_{i+} + 1}{k_{+} + 2}

where

     k_{i+} is the number of watched shows having feature i

     k_{+} is the total number of watched shows.

This is what is called the Laplace correction in the Billsus and
Pazzani article. This Laplace correction must also be done for
the not watched shows.
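The two-regime zero handling above can be sketched as a single function. The name corrected_conditional is hypothetical. Treating a history of exactly 1000 shows as the "greater than" case, and applying the 1/1000 substitute when a class is empty, are assumptions, since the text leaves both cases open.

```python
def corrected_conditional(k_feature, k_class, history_size, threshold=1000):
    """Estimate P(fi | class) without ever returning zero.

    k_feature: shows in the class that have feature i.
    k_class: total number of shows in the class.
    history_size: total number of shows in the viewing history.
    """
    if history_size >= threshold:
        # Laplace correction: (k_i + 1) / (k + 2).
        return (k_feature + 1) / (k_class + 2)
    if k_feature == 0 or k_class == 0:
        # Small history: substitute 1/threshold for a zero estimate.
        return 1 / threshold
    return k_feature / k_class


small_zero = corrected_conditional(0, 50, history_size=100)
small_nonzero = corrected_conditional(10, 50, history_size=100)
large_zero = corrected_conditional(0, 998, history_size=2000)
large_half = corrected_conditional(499, 998, history_size=2000)
```

The same function serves both classes: pass the watched counts for C+ and the not-watched counts for C-.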

Alternative schemes may be devised by those of ordinary
skill in the art.

Classical Bayesian theory would require the use of all
accumulated elements of the list of Fig. 4 in making a
recommendation. Nevertheless, in some instances it may be useful
to use a noise cutoff, eliminating features from consideration if
insufficient data about them appears in the list. For instance,
if a particular feature did not appear in more than some given
percentage of shows considered, whether in negative or in
positive count, it might be ignored in determining which
recommendation to make. Experimentally it was found that a
cutoff of 5% was far too large.

Rather than use a percentage, one embodiment of the noise
cutoff would use the viewer profile itself to determine the
cutoff. This embodiment would first take a subset, or sub-list,
of the viewer profile relating to particular feature types. For
instance, a sub-list might advantageously comprise all of the
elements of the viewer profile relating to the feature types:
time of day and day of the week. Alternatively, in another
example, the sub-list might advantageously comprise all of the
elements of the viewer profile relating to channel number.
Generally the feature type or types chosen should be independent
feature types, in other words feature types which do not require
another feature type to be meaningful.

The sub-list is then sorted by negative count, i.e. by
number of shows having a particular feature value and not being
watched. The highest negative count in this sorted list can be
viewed as the noise level. In other words, since, in the
preferred embodiment, the "not watched" shows are chosen at
random from the week's program schedule, counts as large as the
noise level can occur by chance, and therefore should be ignored.

Thus any feature having both a positive and a negative count
at or below the noise level need not be considered in the
Bayesian calculation in making a recommendation. This example of
noise level thresholding has used a particular feature, e.g.
day/time, as one for determining noise cutoff. In general, any
feature that is uniformly randomly sampled by the negative
example sampling procedure may be chosen by those of ordinary
skill in the art for the calculation of the noise threshold.

The calculations of Figs. 5a-c are advantageous in that they
require fairly low computing power to complete and are therefore
readily adaptable to modest hardware such as would be found in a
set top box.
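The noise-level thresholding described above can be sketched as follows. The profile is assumed to be a dictionary keyed by (feature type, value) pairs holding (positive count, negative count) tuples. That layout, the function names, and the sample counts are illustrative only.

```python
def noise_threshold(profile, feature_types):
    """Highest negative count among entries of the chosen feature types.

    profile: dict mapping (feature_type, value) -> (pos_count, neg_count).
    feature_types: set of feature types forming the sub-list.
    """
    negatives = [
        neg
        for (ftype, _value), (_pos, neg) in profile.items()
        if ftype in feature_types
    ]
    return max(negatives, default=0)


def above_noise(profile, threshold):
    """Keep entries whose positive or negative count exceeds the noise level."""
    return {
        key: (pos, neg)
        for key, (pos, neg) in profile.items()
        if pos > threshold or neg > threshold
    }


profile = {
    ("daytime", "Tue 20:00"): (12, 1),
    ("daytime", "Mon 10:00"): (0, 3),
    ("daytime", "Sat 14:00"): (2, 2),
    ("genre", "comedy"): (9, 2),
    ("actor", "J. Doe"): (1, 0),
}
threshold = noise_threshold(profile, {"daytime"})
kept = above_noise(profile, threshold)
```

Taking the maximum is equivalent to sorting the sub-list by negative count and reading off the top entry. An entry is dropped only when both of its counts are at or below the threshold.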

"Surprise Me" Feature

Recommendations according to the above-described scheme will
be programs having a preponderance of features that are present
in shows that have been watched. The viewer profiles accumulated
will not yield any meaningful recommendations with respect to
shows having few features in common with those features that are
in the watched and not watched shows and register above the noise
level. Accordingly, optionally, the recommender may occasionally
recommend shows at random, in a "surprise me" feature. The
surprise me feature would recommend programs with relatively few
features in common with watched and not watched shows, to the
extent that such features register above the noise level.
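One possible reading of the "surprise me" feature is sketched below. The overlap limit max_shared, the function names, and the show records are assumptions. The text says only that surprise shows have relatively few features in common with the profile entries that register above the noise level.

```python
import random


def surprise_candidates(shows, profile_features, max_shared=1):
    """Shows sharing at most max_shared feature values with the profile."""
    return [
        s for s in shows
        if len(set(s["features"]) & profile_features) <= max_shared
    ]


def surprise_me(shows, profile_features, max_shared=1, rng=random):
    """Pick one low-overlap show at random, or None if there is none."""
    pool = surprise_candidates(shows, profile_features, max_shared)
    return rng.choice(pool) if pool else None


# Feature values assumed to register above the noise level in the profile.
profile_features = {"comedy", "Tue 20:00", "channel 5"}
shows = [
    {"title": "A", "features": ["comedy", "Tue 20:00", "channel 5"]},
    {"title": "B", "features": ["opera", "Sun 15:00", "channel 5"]},
    {"title": "C", "features": ["documentary", "Sat 09:00", "channel 7"]},
]
pool = surprise_candidates(shows, profile_features)
pick = surprise_me(shows, profile_features, rng=random.Random(1))
```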

Using the user profile in other domains 

Once a user profile is developed, the recommendation
techniques of the invention might be used to recommend other
types of items such as movies, books, audio recordings, or even
promotional materials such as tee shirts or posters.

Non-independence of Features

The classical assumption in the domain of Bayesian
classifier theory is that all features are independent.
Therefore, if a feature is, say, often present in positive
shows, but is missing from a show being considered for
recommendation, the fact should count against the show. However,
this may yield undesirable results for the current application.

For example, let us assume that there are five day/time
slots indicated in the user profile as being most watched. Let
us assume further that a particular show being evaluated falls
within one of those five slots. The calculation of Fig. 5c would
then give rise to an increase in probability for the day/time
slot that matches and a decrease for the four day/time slots that
do not match. Intuitively, it appears that the latter decrease
is not reasonably related to an accurate determination of
probability for the show in question. The different values of
day/time are not independent, as every show has one and only
one value, so the values a show does not have should not count
against it.

To remedy this deficiency in the classical Bayesian
approach, it is proposed to designate features into two types:
set 1 and set 2. If a feature is designated set 1, the Bayesian
calculation will ignore any non-matching values of the feature.
If the feature is designated set 2, then the normal Bayesian
calculation, per Fig. 5c, will be done.

Normally in a television application set 1 would include
day/time; station; and title. Some features which have values
only for a few shows, e.g. critic ratings, should also be set 1,
because too many shows would be non-matching merely because
critics tend to rate only a tiny percentage of shows.

Set 2, for television shows, would normally include all
features that can have several values per show, such as actor.
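The set 1 / set 2 distinction can be sketched as a per-class score in which non-matching set 1 values are skipped. This assumes the classical Bernoulli form for the "normal" calculation, since Fig. 5c is not reproduced here. All names and the sample probabilities are hypothetical.

```python
import math

SET_1_TYPES = {"daytime", "station", "title"}  # set 1 types named in the text


def log_score(show_features, prior, cond, set_1_types=SET_1_TYPES):
    """Log of the prior times the per-feature factors for one class.

    show_features: set of (feature_type, value) pairs present in the show.
    cond: dict (feature_type, value) -> smoothed P(fi | class), 0 < p < 1.
    """
    score = math.log(prior)
    for feature, p in cond.items():
        if feature in show_features:
            score += math.log(p)  # matching value counts for both sets
        elif feature[0] not in set_1_types:
            score += math.log(1.0 - p)  # set 2: absence counts against
        # set 1 and non-matching: ignored
    return score


def posterior_positive(show_features, prior_pos, cond_pos, prior_neg, cond_neg):
    """Normalised P(C+ | x) computed from the two class scores."""
    a = log_score(show_features, prior_pos, cond_pos)
    b = log_score(show_features, prior_neg, cond_neg)
    m = max(a, b)  # subtract the maximum before exponentiating
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


cond_pos = {("daytime", "Tue 20:00"): 0.5, ("daytime", "Mon 10:00"): 0.1,
            ("actor", "A"): 0.4}
cond_neg = {("daytime", "Tue 20:00"): 0.1, ("daytime", "Mon 10:00"): 0.5,
            ("actor", "A"): 0.2}
show = {("daytime", "Tue 20:00")}
p = posterior_positive(show, 0.5, cond_pos, 0.5, cond_neg)
```

In this example the non-matching Monday slot contributes nothing to either class, while the absent actor still counts against the show in both classes.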

From reading the present disclosure, other modifications
will be apparent to persons skilled in the art. Such
modifications may involve other features which are already known
in the design, manufacture and use of television interfaces and
which may be used instead of or in addition to features already
described herein. Although claims have been formulated in this
application to particular combinations of features, it should be
understood that the scope of the disclosure of the present
application also includes any novel feature or novel combination
of features disclosed herein either explicitly or implicitly or
any generalisation thereof, whether or not it mitigates any or
all of the same technical problems as does the present invention.
The applicants hereby give notice that new claims may be
formulated to such features during the prosecution of the present
application or any further application derived therefrom.

The word "comprising", "comprise", or "comprises" as used
herein should not be viewed as excluding additional elements.
The singular article "a" or "an" as used herein should not be
viewed as excluding a plurality of elements.

V. CLAIMS



1. A data processing device comprising:
- at least one input for receiving data including
  - viewer profile data; and
  - data regarding a television program; and
- a processor, the processor being adapted to perform the
  following
  - calculating a probability that the television program
    is a desired one; and
  - supplying a recommendation regarding the television
    program based on the probability.



2. The data processing device of claim 1, wherein the input is
coupled with a medium readable by the data processing device.

3. The data processing device of claim 2, wherein the medium
embodies the viewer profile.

4. The data processing device of claim 3, wherein
> the medium is local to the data processing device and
> the viewer profile is arranged so as to be incrementally
  updatable.

5. The data processing device of claim 3, wherein the processor
maintains the viewer profile in accordance with a data
structure comprising:
> a list of feature values; and
> for each element of the list, a respective number of times
  programs having that feature value were watched.

6. The data processing device of claim 5, wherein the data
structure further comprises, for each element of the list, a
respective number of times programs having that feature value
were not watched.

7. The data processing device of claim 6, wherein the processor
is further arranged to perform the following, each time a
user watches a new program,
> first adding, to the list, feature values or counts of such
  feature values, associated with that new program;
> selecting at least one companion program to the new
  program, the companion program being selected at random from
  a program schedule, which companion program has not been
  watched; and
> second adding, to the list, feature values of the companion
  program, or counts of such feature values.

8. The data processing device of claim 5, wherein the processor
is further arranged to perform the following, each time a
user watches a new program: first adding, to the list,
feature values or counts of such feature values, associated
with that new program.

9. The data processing device of claim 2, wherein the medium
embodies the data regarding the television program.

10. The data processing device of claim 1, wherein the input is a
network connection.

11. The data processing device of claim 1, wherein calculating
comprises using a Bayesian classifier.

12. The data processing device of claim 11, wherein the
processor is further adapted to subject the viewer profile to
a noise threshold calculation prior to using the Bayesian
classifier.

13. The data processing device of claim 12, wherein
> the viewer profile data comprises
  > a list of feature values;
  > a respective negative count for each element of the list,
    the negative count indicating a number of times programs
    having that feature value have not been watched;
  > a respective positive count for each element of the list,
    the positive count indicating a number of times programs
    having that feature value have been watched;
> the noise threshold calculation comprises
  > selecting a sub-list comprising at least feature values
    having at least one specific type of feature;
  > choosing the highest negative count in the sub-list as the
    noise threshold;
> the recommendation comprises a program selected from a group
  having at least one feature value having a positive or negative
  count in the viewer profile, which count exceeds the noise
  threshold.

14. The data processing device of claim 12, wherein subjecting
the viewer profile to the noise threshold further comprises using
observations gathered by a known random process to estimate a
reasonable noise threshold.

15. The data processing device of claim 13, wherein the specific
type comprises a day and time of day feature type.

16. The data processing device of claim 13, wherein the specific
type comprises a station identification feature type.

17. The data processing device of claim 1, wherein the viewer
profile data comprises a plurality of respective counts of
programs watched, each respective count indicating how many
programs watched had a respective feature.

18. The data processing device of claim 17, wherein calculating
comprises calculating a probability that the television
program is in a particular class.

19. The data processing device of claim 18, wherein the class is
one of
> programs the viewer is interested in, and
> programs the viewer is not interested in.



20. The data processing device of claim 1, wherein calculating
the probability comprises:
- computing a prior probability of whether a program is
  desired or not;
- computing a conditional probability of whether a feature fi
  will be present if a show is desired or not; and
- computing a posterior probability of whether a program is
  desired or not, based on the conditional probability and the
  prior probability.

21. The data processing device of claim 1, wherein it is assumed
that programs watched are programs that the viewer is
interested in.

22. The data processing device of claim 1, wherein the processor
is further adapted to provide a recommendation regarding an
additional item, other than a television program, based on the
viewer profile.

23. The data processing device of claim 1, wherein the processor
is further adapted to occasionally recommend a surprise show
that has relatively few features in common with watched shows.

24. The data processing device of claim 1, wherein
> the viewer profile comprises a list of feature types and
  values for such feature types;
> the feature types are selected from at least two sets,
  including
  > a first set of feature types whose values are deemed
    non-independent; and
  > a second set of feature types whose values are deemed
    independent; and
> calculating a probability comprises
  > applying a Bayesian classifier calculation corresponding to
    feature types from the second set; and
  > applying a modified Bayesian classifier calculation
    corresponding to feature types from the first set.

25. The data processing device of claim 24, wherein
> with respect to features of the first set, the modified
  Bayesian classifier calculation considers only feature values
  that match with a show being classified.

26. At least one medium readable by a data processing device and
embodying software arranged to perform the following
operations:
- calculating a probability that a television program is
  a desired one, based on a viewer profile and data
  regarding the television program; and
- supplying a recommendation regarding the television
  program based on the probability.



27. The at least one medium of claim 26, wherein the at least
one medium further embodies the viewer profile.

28. The at least one medium of claim 27, wherein the viewer
profile is arranged so as to be incrementally updatable.

29. The at least one medium of claim 27, wherein the viewer
profile is embodied as a data structure comprising:
> a list of feature values; and
> for each element of the list, a respective number of times
  programs having that feature value were watched.

30. The at least one medium of claim 29, wherein the data
structure further comprises, for each element of the list, a
respective number of times programs having that feature value
were not watched.

31. The at least one medium of claim 30, wherein the software is
further arranged to perform the following, each time a user
watches a new program,
> first adding, to the list, feature values or counts of such
  feature values, associated with that new program;
> selecting at least one companion program to the new
  program, the companion program being selected at random from
  a program schedule, which companion program has not been
  watched; and
> second adding, to the list, feature values of the companion
  program, or counts of such feature values.

32. The at least one medium of claim 29, wherein the software
is further arranged to perform the following, each time a user
watches a new program: first adding, to the list, feature
values or counts of such feature values, associated with that
new program.

33. The at least one medium of claim 26, wherein the at least
one medium embodies the data regarding the television
program.

34. The at least one medium of claim 26, wherein calculating
comprises using a Bayesian classifier.

35. The at least one medium of claim 34, wherein the software is
further adapted to subject the viewer profile to a noise
threshold calculation prior to using the Bayesian
classifier.

36. The at least one medium of claim 35, wherein
> the viewer profile data comprises
  > a list of feature values;
  > a respective negative count for each element of the list,
  the negative count indicating a number of times programs
  having that feature value have not been watched;
  > a respective positive count for each element of the list,
  the positive count indicating a number of times programs
  having that feature value have been watched;
> the noise threshold calculation comprises
  > selecting a sub-list comprising at least feature values
  having at least one specific type of feature;
  > choosing the highest negative count in the sub-list as the
  noise threshold;
> the recommendation comprises a program selected from a group
having at least one feature value having a positive or negative
count in the viewer profile exceeding the noise threshold.
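The noise threshold calculation of claim 36 can be sketched as follows. This is a minimal illustration under assumptions: the profile is modelled as a dictionary keyed by a (feature type, value) pair holding a (positive count, negative count) tuple, and the function names and sample counts are hypothetical.

```python
def noise_threshold(profile, feature_type):
    """Highest negative count among feature values of the given type."""
    negatives = [
        neg for (ftype, _value), (_pos, neg) in profile.items()
        if ftype == feature_type
    ]
    return max(negatives, default=0)


def filter_profile(profile, feature_type):
    """Keep only feature values whose positive or negative count exceeds the threshold."""
    threshold = noise_threshold(profile, feature_type)
    return {
        key: (pos, neg)
        for key, (pos, neg) in profile.items()
        if pos > threshold or neg > threshold
    }


# Hypothetical counts: (feature type, value) -> (positive count, negative count)
profile = {
    ("station", "ABC"): (12, 3),
    ("station", "NBC"): (2, 4),
    ("genre", "comedy"): (9, 1),
    ("genre", "news"): (1, 2),
    ("actor", "X"): (0, 6),
}
threshold = noise_threshold(profile, "station")
filtered = filter_profile(profile, "station")
```

With these sample counts the station sub-list yields a threshold of 4, so only the entries for ABC, comedy and actor X survive the filter.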

37. The data processing device of claim 35, wherein subjecting
the viewer profile to the noise threshold further comprises
using observations gathered by a known random process to
estimate a reasonable noise threshold.

38. The at least one medium of claim 36, wherein the specific
type comprises a day and time of day feature type.

39. The at least one medium of claim 36, wherein the specific 
type comprises a station identification feature type. 



S:\TH\700690.DOC






February 4, 2000 (3:52PM) 



ID 700690 



40. The at least one medium of claim 26, wherein the viewer
profile data comprises a plurality of respective counts of
programs watched, each respective count indicating how many
programs watched had a respective feature.

41. The at least one medium of claim 40, wherein calculating
comprises calculating a probability that the television
program is in a particular class.

42. The at least one medium of claim 40, wherein the class
comprises at least one of programs the viewer is interested
in and programs the viewer is not interested in.

43. The at least one medium of claim 26, wherein calculating the
probability comprises:
- computing a prior probability of whether a program is
desired or not;
- computing a conditional probability of whether a feature fi
will be present if a show is desired; and
- computing a posterior probability of whether the program is
desired or not, based on the conditional probability and the
prior probability.

44. The at least one medium of claim 26, wherein it is assumed
that programs watched are programs that the viewer is
interested in.



45. The medium of claim 26, wherein the software is further
arranged to provide a recommendation regarding an additional
item, other than a television program, based on the viewer
profile.

46. The at least one medium of claim 26, wherein the software is
further arranged to occasionally recommend a surprise show
that has relatively few features in common with watched shows.

47. The at least one medium of claim 26, wherein
> the viewer profile comprises a list of feature types and
values for such feature types;
> the feature types are selected from at least two sets,
including
  > a first set of feature types whose values are deemed
  non-independent; and
  > a second set of feature types whose values are deemed
  independent; and
> calculating a probability comprises
  > applying a Bayesian classifier calculation corresponding to
  feature types from the second set; and
  > applying a modified Bayesian classifier calculation
  corresponding to feature types from the first set.

48. The at least one medium of claim 47, wherein with respect to
features of the first set, the modified Bayesian classifier
calculation considers only values that match with a show being
classified.

49. A computer method comprising performing the following
operations in a data processing device:
- Receiving a set of data;
- Filtering the data in accordance with a noise criterion;
- Drawing a conclusion from the filtered data based on a
Bayesian classifier calculation; and
- Presenting the conclusion to a user.

50. The method of claim 49 wherein the noise criterion is based
on a frequency of instances of particular types of data
within the set, which types are believed to likely represent
noise.

51. A data processing method comprising performing the following
operations in a data processing device:
- first receiving data reflecting physical observations, which
data includes a list of feature values and observations
about feature values, some of which feature values are
independent and some of which are not;
- second receiving data about an item to be classified, the
data about the item to be classified including feature
values;
- maintaining a division of the data reflecting physical
observations into at least two sets, including
  - a first set including those feature values which are
  deemed not independent; and
  - a second set including those feature values which are
  deemed independent;
- performing a probabilistic calculation on the data
reflecting physical observations and the data regarding an
item to be classified including:
  - applying a Bayesian classifier calculation with respect
  to feature values relating to the second set; and
  - applying a modified Bayesian classifier calculation
  with respect to feature values relating to the first
  set; and
- presenting a conclusion regarding the item to be classified
to a user based on the probabilistic calculation.

52. The method of claim 51, wherein the modified Bayesian 
classifier calculation comprises ignoring feature values from the 
data reflecting physical observations when those feature values 
are not present in the data regarding the item to be classified.
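One possible reading of the modification recited in claim 52 is sketched below in Python. It assumes that the unmodified calculation multiplies P(fi|C) for each feature value present in the item and 1 - P(fi|C) for each feature value that is absent, and that the modified calculation simply skips absent feature values belonging to the non-independent set. The function name, the argument names and the sample probabilities are hypothetical, and treating station values as the non-independent set is an assumption made only for this example.

```python
def likelihood(cond_prob, present, non_independent):
    """P(x | C) with absent, non-independent feature values ignored.

    cond_prob:        dict mapping feature value -> P(f_i | C)
    present:          set of feature values found in the item being classified
    non_independent:  set of feature values belonging to the "first set"
    """
    result = 1.0
    for feature, p in cond_prob.items():
        if feature in present:
            result *= p              # matching feature value: always counted
        elif feature not in non_independent:
            result *= 1.0 - p        # absent, independent: classical term
        # absent and non-independent: ignored (the modification)
    return result


cond_prob = {
    "station=ABC": 0.5,
    "station=NBC": 0.25,
    "genre=comedy": 0.75,
    "genre=news": 0.5,
}
present = {"station=ABC", "genre=comedy"}
stations = {"station=ABC", "station=NBC"}

modified = likelihood(cond_prob, present, stations)   # 0.5 * 0.75 * 0.5
classical = likelihood(cond_prob, present, set())     # also multiplies (1 - 0.25)
```

In this example the absent station value contributes nothing to the modified product, whereas the classical product is reduced by the extra factor for the station the program does not air on.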
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VI. ABSTRACT OF THE DISCLOSURE 






A system for recommending television programs makes use of
probabilistic calculations and a viewer profile to create a
recommendation. The probabilistic calculations preferably are in
the form of Bayesian classifier theory. Modifications to
classical Bayesian classifier theory are proposed.
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FIG. 1 



Let S be all shows not watched in the 7 days prior to and including the
current day
Let N be 1.

For each TV show watched
    Enter the watched show in the viewing history as a positive example
    Select a subset S of shows not watched
    Select at random N shows from set S and enter them in the viewing
    history as negative examples. If an explicit viewer profile is available,
    then the random selection can be biased away from shows "liked"
    and towards shows "not liked."

Figure 2. Pseudo code for the Viewing history generator
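The pseudo code above could be realized roughly as in the following Python sketch. It is an assumption-laden illustration: shows are represented by plain strings, the caller is assumed to supply a schedule already limited to the relevant 7-day window, and the names `update_history`, `watched_ids` and `n_negatives` are invented for the example. The optional biasing of the random selection by an explicit viewer profile is not implemented here.

```python
import random


def update_history(history, watched_show, schedule, watched_ids,
                   n_negatives=1, rng=random):
    """Record one positive example and n randomly sampled negative examples."""
    history.append((watched_show, True))
    # Candidate negatives: scheduled shows the viewer did not watch.
    candidates = [
        show for show in schedule
        if show not in watched_ids and show != watched_show
    ]
    count = min(n_negatives, len(candidates))
    for show in rng.sample(candidates, count):
        history.append((show, False))
    return history


schedule = ["news", "sitcom", "drama", "sports"]
history = update_history(
    [], "sitcom", schedule,
    watched_ids={"sitcom", "drama"},
    rng=random.Random(0),  # seeded only to make the example repeatable
)
```

Each call appends the watched show as a positive example followed by at most N unwatched shows as negative examples, so the history stays roughly balanced when N is 1.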
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FIG. 5A 



$$T = k(C^+) + k(C^-)$$
$$P(C^+) = k(C^+)/T$$
$$P(C^-) = k(C^-)/T$$

FIG. 5B

$$P(f_i \mid C^+) = k(f_i \mid C^+)/k(C^+)$$
$$P(f_i \mid C^-) = k(f_i \mid C^-)/k(C^-)$$

FIG. 5C

$$P(C^+ \mid x) = P(x \mid C^+)\,P(C^+)/P(x)$$
$$P(C^- \mid x) = P(x \mid C^-)\,P(C^-)/P(x)$$

Where

$$P(x) = P(x \mid C^+)\,P(C^+) + P(x \mid C^-)\,P(C^-)$$

$$P(x \mid C^+) = \prod_{i=1}^{n} P(f_i \mid C^+)^{x_i}\,\bigl(1 - P(f_i \mid C^+)\bigr)^{1 - x_i}$$

$n$ = number of features in the profile
$f_i$ = the $i$-th feature in the profile
$x \in \{0,1\}^n$ is a bit string of length $n$, where the $i$-th bit $x_i$ indicates the
presence (1) or absence (0) of feature $f_i$ in the program
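The calculations of FIGS. 5A to 5C can be traced with a short Python sketch. It is illustrative only and rests on assumptions: the product in FIG. 5C is read as taking P(fi|C) for each feature present in the program and 1 - P(fi|C) for each absent feature, the function and argument names are hypothetical, and the fallback to the prior when P(x) is zero is an added convention rather than something stated in the figures. No smoothing of zero counts is applied.

```python
def posterior_desired(pos_counts, neg_counts, k_pos, k_neg, present):
    """P(C+ | x) computed from raw profile counts.

    pos_counts / neg_counts: dict feature -> k(f_i | C+) / k(f_i | C-)
    k_pos / k_neg:           number of positive / negative examples
    present:                 set of features found in the program (1-bits of x)
    """
    total = k_pos + k_neg                       # FIG. 5A: T
    prior_pos = k_pos / total                   # P(C+)
    prior_neg = k_neg / total                   # P(C-)
    features = set(pos_counts) | set(neg_counts)

    def likelihood(counts, k_class):
        # FIG. 5B conditional probabilities, multiplied as in FIG. 5C.
        result = 1.0
        for feature in features:
            p = counts.get(feature, 0) / k_class
            result *= p if feature in present else 1.0 - p
        return result

    joint_pos = likelihood(pos_counts, k_pos) * prior_pos
    joint_neg = likelihood(neg_counts, k_neg) * prior_neg
    evidence = joint_pos + joint_neg            # P(x)
    # Assumed convention: fall back to the prior if P(x) is zero.
    return joint_pos / evidence if evidence else prior_pos


# Hypothetical counts from four watched and four not-watched shows.
p_desired = posterior_desired(
    pos_counts={"comedy": 3, "news": 1},
    neg_counts={"comedy": 1, "news": 3},
    k_pos=4, k_neg=4,
    present={"comedy"},
)
```

For these sample counts a comedy that is not a news program scores a posterior of about 0.9 for the desired class.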
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